A new generative adversarial network for medical images super resolution

For medical image analysis, there is always an immense need for rich details in an image. Typically, the diagnosis will be served best if the fine details in the image are retained and the image is available in high resolution. In medical imaging, acquiring high-resolution images is challenging and costly as it requires sophisticated and expensive instruments, trained human resources, and often causes operation delays. Deep learning based super resolution techniques can help us to extract rich details from a low-resolution image acquired using the existing devices. In this paper, we propose a new Generative Adversarial Network (GAN) based architecture for medical images, which maps low-resolution medical images to high-resolution images. The proposed architecture is divided into three steps. In the first step, we use a multi-path architecture to extract shallow features on multiple scales instead of single scale. In the second step, we use a ResNet34 architecture to extract deep features and upscale the features map by a factor of two. In the third step, we extract features of the upscaled version of the image using a residual connection-based mini-CNN and again upscale the feature map by a factor of two. The progressive upscaling overcomes the limitation for previous methods in generating true colors. Finally, we use a reconstruction convolutional layer to map back the upscaled features to a high-resolution image. Our addition of an extra loss term helps in overcoming large errors, thus, generating more realistic and smooth images. We evaluate the proposed architecture on four different medical image modalities: (1) the DRIVE and STARE datasets of retinal fundoscopy images, (2) the BraTS dataset of brain MRI, (3) the ISIC skin cancer dataset of dermoscopy images, and (4) the CAMUS dataset of cardiac ultrasound images. The proposed architecture achieves superior accuracy compared to other state-of-the-art super-resolution architectures.

The authors transformed the LR image to discrete cosine transform (DCT) domain and applied interpolation methods to estimate high frequencies. The interpolation-based methods for upscaling by a significant factor such as 6x or 8x typically result in blurred output images.
The reconstruction-based methods perform better than interpolation-based methods. Sun et al. 5 used the gradient profile method to map LR images to HR images. The authors used a non-local mean filter to extract gradient profile information, which was used to map the LR images to HR images. Based on the same idea, Zhang et al. 6 also used gradient knowledge to reconstruct HR images. In 7 , the authors proposed a reconstruction-based method in which non-local mean features and handcrafted features such as Gabor wavelet and sparse domain selection-based features were used for HR image reconstruction. Protter et al. 8 used the gradient profile method for the reconstruction of a sequence of images such as videos. These methods are also applied to medical images as reported in 9,10 . By using the gradient information, these methods outperformed interpolation methods to map LR images to HR images. The limitation of these methods is that one must design and use robust filters to extract gradient information. If the LR images have no clear gradient, then the performance of the SR method will be heavily affected.
The learning-based methods have recently attracted more attention for image super-resolution. For example, Freedman et al. 11 and Yang et al. 12 extracted highly localized patches from the input LR image and defined a mapping function to map these LR patches to an HR image. In 13 , the authors used a random forest classifier to classify the image space into multiple subspaces and then used a regression model to map the extracted patches to HR patches. The above-discussed methods are some of the traditional machine learning-based SR methods.
Deep learning-based methods. Recently, deep learning-based approaches have been explored for image super-resolution. These approaches have outperformed many traditional methods. In 14 , a feed-forward convolutional neural network (SRCNN) was used for feature extraction from LR images. The features were then up-sampled using the bilinear interpolation method. The architecture works in an end-to-end manner. SRCNN reported better results compared to the traditional methods. Kim et al. 15 used deep recursive CNN for image super-resolution to extract complex features. The method is divided into three parts: the embedding network used for basic features extraction, the inference network used for deep feature extraction, and the reconstruction layer that maps back the features to a HR image. The deep recursive approach is also used in 16 and 17 . With such deep networks, the vanishing gradient problem is a major concern. A dense skip connection was used by 18 to overcome the problem of vanishing gradient.
For up-sampling in CNNs, the sub-pixel convolutional layer was introduced in 19 , and an enhanced sub-pixel convolutional network was proposed in 20 . The sub-pixel up-sampling layer outperformed both interpolationbased up-sampling and transpose convolution-based upsampling methods. Zhang et al. 21 proposed a channel attention mechanism to improve the performance of image SR methods. The authors used local receptive fields to incorporate channel-dependent features. The limitation of the method is that it might miss some non-local features. Dai et al. 22 used both local and non-local receptive fields for feature extraction to handle this problem. The performance of the super-resolution method relies on the robustness of the extracted features. Due to better performance, the ResNet34 23 architecture was adopted by authors in 24,25,25 . Using the same idea, Ahn et al. 26 used a cascade of the residual network blocks.
To extract robust features, Lim et al, in 27 used ResNet-based architecture for feature extraction. In addition, Lim et al. 27 modified the traditional ResNet architecture by removing the batch normalization layer after each convolutional layer and proposed enhanced deep super-resolution architecture (EDSR). The limitation of the above methods is that all these methods only focus on the high peak-signal-to-noise ratio (PSNR). To obtain high PSNR, the authors only used mean square error (MSE) or mean absolute error (MAE) losses; however, the two losses make the resultant image more smooth and do not preserve the fine contents of the image. Pourya et al. 28 has proposed a progressive dilated convolution neural network for SR. The proposed method has shown superior performance compared to SRCNN and other existing models on natural images while utilizing less computational resources.
Among deep learning-based super-resolution methods, Generative Adversarial Networks (GAN) based methods have demonstrated significant improvement 29,30 . To overcome the limitations of CNN-based SR methods, the GAN architecture uses the perceptual or content based loss function, i.e., the VGG19 loss. The first GANbased SR method, called SRGAN, was proposed in 31 . The generator of the SRGAN is a ResNet34 architecture that extracts features from LR images. For up-sampling, a sub-pixel convolution layer was used. This method was further modified by Wang et al. 32 . They used multi-residual networks, i.e., residual-in-residual dense blocks. In addition, the authors eliminated the batch normalization layer after every convolutional layer. The elimination of the batch normalization layer improved the performance of traditional SRGAN architecture. Using multiple residual blocks or nested residual blocks increases computational complexity. Jiang et al. 33 proposed edgeenhanced GAN for super-resolution of satellite images. The architecture comprises two networks: ultra-dense subnetwork (UDSN), which extracts features and obtains high resolution features having sharp edges, and edgeenhanced sub-network (EESN), which enhances the extracted sharp edges and removes artifacts produced during the UDSN feature extraction process. In 34 , the authors used a GAN and a saliency map to generate retinal fundus HR images. The effect of saliency map is used to minimize the GAN cost function, which effectively improves the performance of the generation of HR images using GAN. The limitation of this method is that, to improve the neural network's performance, saliency maps are extracted separately, and efficient filters should be used to obtain robust saliency maps. Chen et al. 35 proposed a multilevel densely connected SR network (mDCSRN) for brain structural MRI images super-resolution. The authors reported good accuracy with the proposed network and reported a sixfold increase in speed compared to other GAN-based models. MedSRGAN is proposed in 36  www.nature.com/scientificreports/ Multiple residual-in-residual (RIR) blocks are used for feature extraction and sub-pixel convolutional layer for 4x upscaling. The discriminator network in this work takes pair of images, i.e. (LR, HR) images pair and (LR, SR) images pair, to discriminate between a generated HR image and an original HR image. The limitation of the above-mentioned GAN-based methods is that they use very deep networks on a single scale, i.e., filter size three, which may miss some of the large-scale features. Furthermore, these methods perform the up-sampling in a single step rather than up-sampling progressively.
In this paper, we propose a GAN-based image super-resolution method for medical images. In the proposed method, we use SRGAN 31 as a baseline and introduce significant modifications to its generator architecture. In 31 , the authors used ResNet34 architecture in the generator to extract single-scale features from LR images and then used a sub-pixel convolutional layer for 4x upscaling. In our work, we modify the structure of the generator network to extract shallow features on different scales. We evaluate our model on publicly available medical imaging datasets, i.e., retinal images datasets (STARE and DRIVE) 37,38 , ISIC skin lesion segmentation dataset 39 , BraTS 2018 dataset of brain MRI 40 , and the CAMUS dataset of cardiac ultrasound images 41 . The main contributions of this work are: • We present a new GAN-based super-resolution model for medical images. The model extracts shallow features on different scales, i.e., filter sizes 3, 5, and 7. In other GAN-based networks 31,36 , the authors extracted only single-scale features, i.e., filter size 3. • Unlike other GAN-based models, our model splits the process of upscaling into two steps. In the first step, shallow features are extracted, followed by deep features extraction using ResNet34 architecture, and then 2x upscaling is performed. In the second step, more complex features are calculated from the 2x upscaled version, and then 2x upscaling is performed again. This progressive upscaling helps in preserving fine details at each scale. • In our model, we add one more loss function, i.e., mean absolute error (L1 loss), to the loss functions of SRGAN 31 , which in turn results in performance improvement. • The super-resolution performance for our proposed method using multi-scaled features, progressive upsampling with added loss function, outperforms the current state-of-the-art SR methods on multiple medical imaging modalities.

Methods
Our architecture is a multi-path and progressive upscaling GAN. In the following text, we present the generator architecture, the discriminator architecture and the loss functions.
Generator network. To overcome the shortcomings and improve the accuracy of the SRGAN 31 for medical images, we propose a novel generator network with the following three major changes. The block diagram of the proposed architecture illustrating the changes, is shown in Fig. 1. The major components are: • Shallow features extraction. Shallow features extraction. In this part of the architecture, we use the LR version of the original HR image as an input to the architecture. We calculate basic or shallow features on three different scales by using kernels of sizes 3, 5, and 7, respectively. For feature extraction of each scale, we use two blocks. Each block consists of two convolutional layers, the first convolutional layer in each branch is followed by the batch normalization layer and ReLU activation. In comparison, the second convolutional layer is followed by the batch normalization layer. The number of channels for each pair of convolutional layers is 64. A skip connection is used between the output of the 1st block and the 2nd block. Using the skip connections, the features of the 1st block are added elementwise with the features of the 2nd block. After the feature extraction on different scales, we concatenate (channel-wise) all the three scales features into a single feature vector forming a 192 channels wide features vector as shown in Fig. 2. This feature vector contains the basic or shallow features of the LR image, which will be used as the input to the next step of the proposed architecture. The parameters of this network are shown in Table 1. The extraction of basic or shallow features on three different scales helps in the preservation of the fine details that are very important in medical images. For example, in retinal vessel images, the tiny vessels are important. Hence, it is  www.nature.com/scientificreports/ important to preserve these tiny vessels' features and map them to HR features. Similarly, for tumor detection in brain MRI, the structure, i.e., edges of the tumor, are important information that should be preserved by preserving the information of tiny edges while mapping LR images to HR images. Therefore, the purpose of using three different scales for basic features extraction is to preserve the fine details on each scale.
Deep features extraction and 2x upscaling. In traditional SRGAN 31 , the authors used ResNet34 architecture for deep features extraction with a single pre-residual convolutional layer. Furthermore, each convolutional layer is 64 channels wide. In comparison, we use 192 channels in each convolutional layer. The reason behind using 192 channels in each layer is that we are using the 192 channels wide concatenated output of step 1 as an input in this step of the architecture. As shown in Fig. 3, our ResNet34 architecture consists of 17 blocks, where each block has two convolutional layers. The first convolutional layer is followed by the batch normalization layer and ReLU activation, whereas a batch normalization layer follows the second convolutional layer. The features of each block are added elementwise with the features of the previous block. In the last layer of the ResNet34 architecture, we used kernel size 3 instead of kernel size 9 to overcome the blur effect and preserve more features. In addition, the last layer is added elementwise with the feature map of part 1, as shown in Fig. 3. After deep features extraction, we upscale the extracted features using a sub-pixel convolutional layer. As a result, we have 2x upscaled features of the LR image, which will be used as the input in step 3 of the architecture. The parameters used in this network are listed in Table 2.
Features extraction of upscaled version and 2x upscaling. At this step, we have a 2x upscaled version of an LR image. In this step, we calculate features of the 2x upscaled version. We use a mini-network comprising three residual blocks, as shown in Fig. 4. Each residual block has two convolutional layers. The first convolutional layer is followed by the batch normalizationlayer and ReLU activation, whereas a batch normalization layer follows the second convolutional layer. At the end of the mini residual network, a 192-channel wide convolutional layer followed by the batch normalization layer is used. The features map obtained after 2x upscaling in step 2 is elementwise added with the feature map extracted using the mini-network as shown in Fig. 4. This feature map is then upscaled 2x using a sub-pixel convolutional layer. At the end of step 3, we have a 4x upscaled feature map of the LR image. In the end, a convolutional reconstruction layer is used to map back the 4x upscaled features to the HR image. The parameters used in the mini-network are listed in Table 3. In 31,36 , the authors upscaled the extracted feature map in a single step at the end of the architecture. By upscaling the features in a single step, artifacts may appear in the resultant image due to predicting the wrong information four times in a single step. In our work, we split the process of upscaling into two steps. We extracted the feature of LR image, 2x upscaled the feature map, and rather than again upscaling 2x in the same step, we extracted features of 2x upscaled version and again 2x upscaled the feature map. The significance of this change over other GAN-based methods is discussed in the result section.
Discriminator network. In this work, we also modify the traditional discriminator network of SRGAN 31 .
In 31 the discriminator network consists of 8 convolutional layers, having the number of kernels increasing from 64 to 512, and a single dense layer at the end. Each convolutional layer is followed by the batch normalization layer and ReLU layer. As our generator model is a deep network, hence it requires a strong discriminator in competition, therefore, we propose important modifications in the traditional discriminator. We add more convolutional layers to the discriminator by first increasing the number of kernels in each layer up to 2048 and then reducing the number of kernels progressively down to 512 (see Fig. 5). Thereafter, we have added three additional layers having the number of kernels equal to 128, 256, and 512, respectively. We have elementwise added the output prior to these three layers, and the output of the last 512 kernels layer as shown in Fig. 5. Using skip connection helps to eliminate the vanishing gradient problem.    Fig. 1).

Parameters Values
No. of blocks 17 No. of kernels in each layer 192 Kernel size 3 × 3

Activation function ReLU
Upscale layer Sub-pixel layer Scaling factor 2 Table 3. Parameters for the mini-network shown in Fig. 4.

Parameters Values
No. of blocks 3 No. of kernels in each layer 64 Kernel size 3 × 3 where D(Ī) is the output of the discriminator.
where I i,j,k is the original image and Ī i,j,k is the generated image. The variables h, w, and c represent the height, width and the number of channels.
In our model, we use all the three losses, along with the absolute difference error (L1 loss) as shown in Eq. (4), which in turn improves the peak signal to noise ratio. L1 loss has the limitation of over smoothing results. This is addressed by using L2 loss along with L1 loss.
where I i,j,k is the original image and Ī i,j,k is the generated image. The variables h, w, and c represent the height, width and the number of channels. The overall loss is then calculated as in Eq. (5):  Cardiac ultrasound images dataset. The CAMUS (Cardiac Acquisition for Multi-structure Ultrasound Segmentation) dataset 41 contains cardiac 2-chamber and 4-chamber 2D ultrasound images. The dataset is publicly available and contains 500 cardiac ultrasound images. Each image has a different resolution, and all of them are larger than 1024×512. Therefore, each image is re-scaled to a resolution of 1024×512 for use as ground truth. To obtain the LR version, each image is downsampled 4 times to 256×128 resolution. For the CAMUS dataset, we use 400 images for training and 100 images for testing, and a batch size of 2.

Results and discussion
Evaluation metrics. The following evaluation measures were used to compare the performance of the proposed work with other super-resolution methods.
Peak signal-to-noise ratio. Peak signal-to-noise ratio (PSNR) is used to measure the quality of the reconstructed image. PSNR is the ratio of the maximum possible power of a signal to the noise that affects the quality of the signal. Mathematically, PSNR is shown in Eq. (6).
where, I is the image and MSE is mean square error (MSE). The lower the MSE, the higher will be PSNR. MSE is shown in Eq. (7).
where, I(x, y) is the original image and Ī (x, y) is generated image. The variables h and w represent the height and width of the image. Results on retinal images. The PSNR and SSIM scores of the proposed architecture are compared with the traditional SRGAN 31 and Bicubic interpolation method. Table 4 shows the score of PSNR and SSIM on the retinal images' dataset for each method. For the STARE dataset, our method improves the PSNR score by 4.95 dB compared to traditional SRGAN and by 11.52 dB compared to the Bicubic interpolation method. SSIM score is increased by 4 percent compared to traditional SRGAN and 6 percent compared to the Bicubic interpolation method. The experiment is repeated ten times to find the standard deviation. Similar improvement is shown while testing on the DRIVE dataset. The SR results on the STARE dataset are shown in Fig. 6. In Fig. 6, each row shows the result for a single image. The LR image, the interpolation-based SR results, SRGAN 31 results, proposed architecture results, and the corresponding ground truth HR images are shown in Fig. 6a-e, respectively. The green arrow shows the area where there exists the degradation caused by each method. The interpolation-based method degrades the structure by making it very blurry, increasing the mean square error. The SRGAN 31 method removes the blurriness in the image but adds a lot of noise which can be seen near the optic disk. In comparison, the proposed architecture preserves the structure of the vessels and removes the noise making the image smooth. In Figs. 7 and 8, we have extracted small regions from two images of the DRIVE dataset to show the image quality degradation caused by each algorithm. The proposed architecture produces more realistic HR images compared to the other two methods.
Results on skin images. The PSNR and SSIM scores for the skin dataset are shown in Table 5. The proposed architecture improved PSNR by 3.11 dB and SSIM by 4 percent compared to SRGAN, and PSNR by 9.27 dB, and SSIM by 2 percent compared to the bicubic interpolation. The experiment is repeated ten times for each method to report the standard deviation. In Figs. 9 and 10, we have extracted small regions from resultant images to show the qualitative improvement of our model. As shown in Figs. 9b and 10b, the SRGAN does not preserve colors correctly, and hence SSIM is heavily affected, while the bicubic method adds artifacts and blurriness in the resultant image (see Fig. 9a). The output of the proposed architecture is shown in Figs. 9c and 10c, which are qualitatively very similar to the ground truth HR images in Figs. 9d and 10d, respectively.
Results on brain MRI images. The PSNR and SSIM scores for the BraTS dataset are shown in Table 6. The proposed method improved PSNR and SSIM by 9.31 dB and 7 percent, respectively, compared to bicubic interpolation and 6.35 dB and 5 percent, respectively, compared to the SRGAN method. The visual results of each method are shown in Fig. 11. Each row contains the SR results of a single image. The arrow shows the degradation caused by each method. It can be seen from Fig. 11b that the interpolation-based method produces blurry results. On the other hand, the SRGAN 31 does not preserve the actual structure and adds noise to the resultant image, as shown in Fig. 11c. In comparison, the proposed architecture in Fig. 11d generates SR output images that are very close to the images in Fig. 11e, i.e., corresponding HR images. In Fig. 12, we have extracted a small region from an image and compared each method. These figures show that the proposed method generates more realistic HR images than SRGAN 31 and the interpolation-based method.
Furthermore, in Results on ultrasound images. The PSNR and SSIM scores for the CAMUS dataset of cardiac ultrasound images are shown in Table 8. The proposed architecture improved PSNR by 1.65 dB and SSIM by 2 percent compared to SRGAN, PSNR by 3.9 dB, and SSIM by 6 percent compared to bicubic interpolation. The experiment is repeated ten times to obtain the standard deviation. To better visualize the qualitative improvement of the proposed model, we have extracted small regions from the resultant images to identify the qualitative improvement of the proposed model, as shown in Fig. 13. The bicubic interpolation output shown in Fig. 13a has blur effects, and the original details are not preserved. Similarly, in Fig. 13b, the output of SRGAN has a dark shade though less explicit, compared to Fig. 13d, i.e., ground truth HR image. Finally, the output of the proposed architecture is shown in Fig. 13c, which is very similar to the ground truth HR image in Fig. 13d. www.nature.com/scientificreports/ Results on noisy data. The proposed model is also evaluated using noisy brain MRI images. Gaussian noise with zero mean and variance of 0.01 is added to the brain MRI images. The SSIM and PSNR scores for both SRGAN and the proposed methods used on noisy data are shown in Table 9. The proposed architecture improved SSIM by 4 percent and PSNR by 4.85 dB compared to SRGAN 31 . As shown in Fig. 14, the proposed method outperformed the SRGAN 31 method with a noticeable improvement. In Fig. 14, each row represents the results for a single image. To further identify the differences, small regions from the resultant images are shown in Fig. 15. The first row in Fig. 15 shows the LR image, the noisy LR, and the corresponding HR image. The SRGAN 31 method does not preserve the structure of the image and hence causes noticeable degradation as shown in Fig. 15a, while the proposed method produces better results on noisy images shown in Fig. 15b. he proposed method result on noisy image is very close to the result on non-noisy image shown in Fig. 15c and the ground truth HR image shown in Fig. 15d.

Results analysis.
The results on the four datasets show significant improvement of performance achieved by the proposed method compared to other SR methods, particularly the interpolation-based methods and the SRGAN 31 . In SRGAN 31 , the authors extracted basic or shallow features using a single scale, i.e., kernel size 3. However, the features of large scale are missed by kernel size 3. Therefore, in part 1 of our proposed architecture, we extracted the shallow features on three different scales, i.e., kernel size 3, 5, and 7. Subsequently, we have concatenated all features into a single feature vector, as shown in Fig. 2. The effect of this can be seen, for instance, in Figure 6. Results of super resolution for different images of STARE dataset. The column (a) has LR images (128×128 but scaled for uniform display), column (b) shows Bicubic interpolation SR result, column c shows output HR images for SRGAN 31 , column d shows output HR images for the proposed architecture, column e shows ground truth HR images (512×512). Each row illustrates results of a single image. The corresponding (SSIM/PSNR) scores are mentioned below each image. Green arrows illustrate the degradation or improvement caused by each algorithm with respect to the ground truth HR image. www.nature.com/scientificreports/ Fig. 12b, where the SRGAN missed the important features, while in Fig. 12c, the proposed architecture preserves the correct structure of the image while generating the HR image. Furthermore, the second important proposed change in the architecture of SRGAN 31 is that we have used the idea of progressive upscaling. In SRGAN 31 , the authors performed upscaling in a single step, i.e., after the feature extraction. The limitation of performing upscaling is that it does not generate true colors while generating an HR image. In our proposed work, we have extracted features of an LR image, then performed 2x upscaling, and extracted the features of the upscaled version, and again performed 2x upscaling. The advantage of this change can be seen in Figs. 9 and 10. In Figs. 9b and 10b, the SRGAN architecture does not preserve the actual colors in the generated HR images, while in Figs. 9c and 10c, the proposed architecture generates realistic colors very similar to the ground truth HR images in Figs. 9d and 10d.
The third proposed change is that we have added an extra loss function to the loss functions used in SRGAN 31 . We have used L1 loss, i.e., mean absolute error given in Eq. (4). The overall loss function used in the proposed architecture is shown in Eq. (5). L1 loss penalizes large errors. Therefore, using L1 loss in addition to the other three losses generates more realistic and smooth images. The significance of using L1 loss can be seen in Figs. 7 and 8. As shown in Figs. 7b and 8b, the SRGAN generates the HR image with noise in it. The noise can be seen around the vessel in Fig. 7b and near the optic disc in Fig. 8b. In Figs. 7c and 8c, the proposed architecture generates smoother HR images similar to the corresponding HR image shown in Figs. 7d and 8d, respectively.
The results presented in this work are obtained on an Nvidia GTX 1080 Ti GPU and the code is implemented in Tensorflow 2.0.  an ablation study on the retinal images from the DRIVE dataset. We performed ablation using three different steps. In the first step, we made changes to the feature extraction module. In this step, we extracted features on a single scale only, i.e., using kernel size 3. By doing so, we recorded that the SSIM and PSNR scores were reduced by 3.2 dB and 3 percent, respectively. In the second step, we made changes to the mini-network module of the proposed method. We removed the mini-network between the two subpixel convolution layers in this step. By doing so, we observed a 4.62 dB decrease in the PSNR and a 5 percent decrease in SSIM score. In the third step, we removed the additional error function, i.e., the absolute difference error of Eq. (4). By removing the addi-   www.nature.com/scientificreports/ tional loss function, the PSNR and SSIM scores decreased by 6.58 dB and 8 percent, respectively. The scores for each experiment of the ablation study are shown in Table 10. To visualize the difference, we have extracted small regions from the resultant images as shown in Fig. 16. From Fig. 16a, it is clear that the experiment using a single scale features extraction resulted in a low-quality image. The degraded result for the second experiment "without mini-network" is shown in Fig. 16b. Figure 16c shows the resultant image of the third experiment, i.e., by removing the absolute difference error term from the loss function. The noise is clearly seen in the small extracted patch. The result of the proposed method incorporating all the modules of the proposed method is shown in Fig. 16d, which produces a better SR image that is very close to the HR ground truth image shown in Fig. 16e.

Conclusion
In this work, we have proposed a new GAN-based SR method that outperforms other state-of-the-art superresolution methods on different medical imaging modalities. We have used the concept of a multipath learning network to extract features at different scales and a progressive upscaling network to prevent the artifacts in the generated HR image. Furthermore, we have used L1 loss (mean absolute error) (mean absolute error) along with the the losses used in SRGAN, which generates more realistic and smooth HR images. We have evaluated the performance of the traditional SRGAN 31 , interpolation bases method (bicubic interpolation), and our proposed architecture on four publicly available medical imaging datasets. Experimental results on medical imaging    www.nature.com/scientificreports/  www.nature.com/scientificreports/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http:// creat iveco mmons. org/ licen ses/ by/4. 0/.